
    Formal Context Generation using Dirichlet Distributions

    We suggest an improved way to randomly generate formal contexts based on Dirichlet distributions. For this purpose we investigate the predominant way to generate formal contexts, the coin-tossing model, recapitulate some of its shortcomings, and examine its stochastic model. Building on this, we propose our Dirichlet model and develop an algorithm employing this idea. By comparing our generation model to a coin-tossing model, we show that our approach is a significant improvement with respect to the variety of contexts generated. Finally, we outline a possible application in null model generation for formal contexts. (Comment: 16 pages, 7 figures)
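    To make the contrast with the coin-tossing model concrete, here is a minimal Python sketch of the general idea (per-row densities drawn from a Dirichlet prior rather than a single global probability); it is an illustration, not the paper's exact algorithm:

```python
import numpy as np

def coin_toss_context(n_objects, n_attributes, p=0.3, rng=None):
    """Classical model: every cell is 1 with the same fixed probability p."""
    rng = np.random.default_rng(rng)
    return rng.random((n_objects, n_attributes)) < p

def dirichlet_context(n_objects, n_attributes, alpha=(1.0, 1.0), rng=None):
    """Sketch of the Dirichlet idea: each object draws its own row density
    from a Dirichlet prior (Beta in this two-outcome case), producing more
    varied contexts than a single global coin-toss probability."""
    rng = np.random.default_rng(rng)
    context = np.zeros((n_objects, n_attributes), dtype=bool)
    for i in range(n_objects):
        p_row = rng.dirichlet(alpha)[0]   # per-object incidence probability
        context[i] = rng.random(n_attributes) < p_row
    return context

print(dirichlet_context(5, 4, rng=0).astype(int))
```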

    On the Usability of Probably Approximately Correct Implication Bases

    We revisit the notion of probably approximately correct implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal of investigating whether such bases represent a suitable substitute for exact implication bases in practical use cases. To this end, we quantitatively examine the behavior of probably approximately correct implication bases on artificial and real-world data sets and compare their precision and recall with respect to their corresponding exact implication bases. Using a small example, we also provide qualitative insight that implications from probably approximately correct bases can still represent meaningful knowledge from a given data set. (Comment: 17 pages, 8 figures; typos fixed, corrected x-label on graph)
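    The precision/recall comparison can be pictured with a small sketch: an implication counts as correct if the exact base entails it, and as recovered if the approximate base entails it. Helper names are hypothetical, and this is not necessarily the paper's evaluation protocol:

```python
def close(attrs, implications):
    """Closure of an attribute set under implications (premise, conclusion)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

def entails(base, implication):
    premise, conclusion = implication
    return conclusion <= close(premise, base)

def precision_recall(approx_base, exact_base):
    tp_precision = sum(entails(exact_base, imp) for imp in approx_base)
    tp_recall = sum(entails(approx_base, imp) for imp in exact_base)
    return tp_precision / len(approx_base), tp_recall / len(exact_base)

exact = [(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))]
approx = [(frozenset("a"), frozenset("c"))]    # valid but incomplete
print(precision_recall(approx, exact))         # -> (1.0, 0.0)
```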

    An Incremental Learning Method to Support the Annotation of Workflows with Data-to-Data Relations

    Workflow formalisations are often focused on the representation of a process with the primary objective of supporting execution. However, there are scenarios where what needs to be represented is the effect of the process on the data artefacts involved, for example when reasoning over the corresponding data policies. This can be achieved by annotating the workflow with the semantic relations that occur between these data artefacts. However, manually producing such annotations is difficult and time-consuming. In this paper we introduce a method based on recommendations to support users in this task. Our approach is centred on an incremental association rule mining technique that compensates for the cold-start problem caused by the lack of a training set of annotated workflows. We discuss the implementation of a tool relying on this approach and how its application on an existing repository of workflows effectively enables the generation of such annotations.
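    A toy sketch of the incremental flavour of such a recommender: co-occurrence counts are updated as each newly annotated workflow arrives, so suggestions improve without retraining on a full corpus. The class and names are illustrative, not the paper's actual technique:

```python
from collections import Counter
from itertools import combinations

class IncrementalRuleMiner:
    """Counts co-occurrences of annotations as workflows are confirmed and
    recommends the relations most strongly associated with those present."""
    def __init__(self):
        self.item_counts = Counter()
        self.pair_counts = Counter()

    def add_workflow(self, annotations):
        # Update counts with each newly annotated workflow (no retraining).
        for a in annotations:
            self.item_counts[a] += 1
        for a, b in combinations(sorted(annotations), 2):
            self.pair_counts[(a, b)] += 1

    def recommend(self, present, top_k=3):
        scores = Counter()
        for a in present:
            for (x, y), n in self.pair_counts.items():
                other = y if x == a else x if y == a else None
                if other and other not in present:
                    scores[other] += n / self.item_counts[a]  # confidence
        return [r for r, _ in scores.most_common(top_k)]

miner = IncrementalRuleMiner()
miner.add_workflow({"copy", "derived-from"})
miner.add_workflow({"copy", "derived-from", "anonymized"})
print(miner.recommend({"copy"}))   # -> ['derived-from', 'anonymized']
```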

    Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

    In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, none of its superpatterns is frequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying them. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which allows generating best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, this algorithm is polynomial for data on itemsets and interval tuples. We consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches.
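    To fix intuitions on interval data: support is anti-monotonic (tightening an interval pattern can only shrink its extent), while a delta-style measure, sketched here as the drop in support under a fixed tightening, need not behave monotonically. This is a simplified stand-in, not the paper's exact definitions or the Sofia algorithm itself:

```python
import numpy as np

data = np.array([[1.0, 2.0],   # interval-tuple dataset: one row per object
                 [1.5, 2.5],
                 [3.0, 1.0]])

def support(pattern):
    """Number of objects falling inside an interval-tuple pattern,
    given as [(lo, hi), ...], one interval per attribute."""
    lo = np.array([iv[0] for iv in pattern])
    hi = np.array([iv[1] for iv in pattern])
    return int(((data >= lo) & (data <= hi)).all(axis=1).sum())

def delta(pattern, shrink=0.5):
    """Simplified delta-style measure: support lost when every interval
    is tightened by `shrink` on both sides (a stand-in for the paper's
    delta-measure, which compares against immediate subpatterns)."""
    tighter = [(lo + shrink, hi - shrink) for lo, hi in pattern]
    return support(pattern) - support(tighter)

wide = [(0.0, 4.0), (0.0, 3.0)]
print(support(wide), delta(wide))
```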

    Some Programming Optimizations for Computing Formal Concepts

    This paper describes in detail some optimization approaches taken to improve the efficiency of computing formal concepts. In particular, it describes the use and manipulation of bit-arrays to represent FCA structures and carry out the typical operations undertaken in computing formal concepts, thus providing data structures that are both memory-efficient and time-saving. The paper also examines the issues and compromises involved in computing and storing formal concepts, describing a number of data structures that illustrate the classical trade-off between memory footprint and code efficiency. Given that there has been limited publication of these programmatic aspects, these optimizations will be useful to programmers in this area and also to any programmers interested in optimizing software that implements Boolean data structures. The optimizations are shown to significantly increase performance by comparing an unoptimized implementation with the optimized one.
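    As an illustration of the bit-array idea (here with Python's arbitrary-precision integers standing in for bit-arrays; the paper's implementation details may differ), each attribute extent becomes a bitset over objects, so intersecting extents is a single AND and support is a popcount:

```python
def attribute_extents(context):
    """Pack each attribute's extent into an integer bitset:
    bit i of extents[a] is set iff object i has attribute a.
    context: one set of attribute indices per object."""
    n_attrs = 1 + max(a for row in context for a in row)
    extents = [0] * n_attrs
    for obj, row in enumerate(context):
        for a in row:
            extents[a] |= 1 << obj
    return extents

context = [{0, 1}, {1, 2}, {0, 2}]       # 3 objects x 3 attributes
extents = attribute_extents(context)
common = extents[0] & extents[1]         # objects having both attrs 0 and 1
print(bin(common), common.bit_count())   # popcount = support (Python 3.10+)
```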

    Making Use of Empty Intersections to Improve the Performance of CbO-Type Algorithms

    This paper describes how improvements in the performance of Close-by-One type algorithms can be achieved by making use of empty intersections in the computation of formal concepts. During the computation, if the intersection between the current concept extent and the next attribute-extent is empty, this fact can simply be inherited by subsequent children of the current concept. Thus, subsequent intersections with the same attribute-extent can be skipped. Because these intersections require the testing of each object in the current extent, significant time savings can be made by avoiding them. The paper also shows how further time savings can be made by forgoing the traditional canonicity test for new extents when the intersection is empty. Finally, the paper describes how, because of typical optimizations made in the implementation of CbO-type algorithms, even more time can be saved by amalgamating inherited attributes and inherited empty intersections into a single, simple test.
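    A compact sketch of the inheritance idea on bitset extents (a simplified CbO: the bottom concept with empty extent is omitted, and names are illustrative rather than the paper's code):

```python
def intent_of(extent, extents):
    """All attributes shared by every object in `extent` (a bitset)."""
    return {a for a, e in enumerate(extents) if extent & e == extent}

def cbo(extents, extent, intent, start=0, skip=frozenset(), out=None):
    """Close-by-One with empty-intersection inheritance (sketch).
    When extent & extents[j] is empty, j is added to `skip`, which all
    subsequent children inherit, so the costly intersection (and the
    canonicity test) for j is never repeated below this branch."""
    if out is None:
        out = []
    out.append((extent, frozenset(intent)))
    for j in range(start, len(extents)):
        if j in intent or j in skip:
            continue
        new_extent = extent & extents[j]
        if new_extent == 0:
            skip = skip | {j}            # inherited by later children
            continue
        new_intent = intent_of(new_extent, extents)
        if all(a in intent for a in new_intent if a < j):   # canonicity
            cbo(extents, new_extent, new_intent, j + 1, skip, out)
    return out

extents = [0b011, 0b101, 0b110]          # attribute extents over 3 objects
top = 0b111
for ext, att in cbo(extents, top, intent_of(top, extents)):
    print(bin(ext), sorted(att))
```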

    On Coupling FCA and MDL in Pattern Mining

    Pattern mining is a well-studied field in data mining and machine learning. Modern methods are based on dynamically updated models, among which MDL-based ones ensure high-quality pattern sets. Formal concepts also characterize patterns in a condensed form. In this paper we study the MDL-based algorithm Krimp in an FCA setting and propose a modified version that benefits from FCA and relies on the probabilistic assumptions that underlie MDL. We provide experimental evidence that the proposed approach improves the quality of the pattern sets generated by Krimp.
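    For orientation, the MDL principle behind Krimp assigns each code-table pattern a Shannon-optimal code derived from its usage; the sketch below computes the database part of the total code length (the code-table cost is simplified away, so this illustrates the scoring, not Krimp itself):

```python
import math

def database_bits(usages):
    """L(D | CT): encode each pattern occurrence with a Shannon-optimal
    code of length -log2(usage / total). Lower totals mean the pattern
    set compresses, and hence describes, the data better."""
    total = sum(usages.values())
    return sum(u * -math.log2(u / total) for u in usages.values() if u > 0)

# Usage counts of code-table patterns after covering a toy database.
usages = {("a", "b"): 5, ("c",): 3, ("a",): 2}
print(round(database_bits(usages), 2))   # total bits for the database part
```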

    Creation and evolution of magnetic helicity

    Projecting a non-Abelian SU(2) vacuum gauge field - a pure gauge constructed from the group element U - onto a fixed (electromagnetic) direction in isospace gives rise to a nontrivial magnetic field, with nonvanishing magnetic helicity, which coincides with the winding number of U. Although the helicity is not conserved under Maxwell (vacuum) evolution, it retains one-half its initial value at infinite time. (Comment: clarifying remarks and references added; 12 pages, 1 figure using BoxedEPSF, REVTeX macros; submitted to Phys Rev D; email to [email protected])
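    For reference, the magnetic helicity in question is the standard Chern-Simons-type integral; a minimal reminder using standard definitions, not reproduced from the paper:

```latex
% Magnetic helicity of the projected field (standard definition):
H = \int d^3x \, \mathbf{A} \cdot \mathbf{B},
\qquad \mathbf{B} = \nabla \times \mathbf{A}.
% Per the abstract: for the projected pure-gauge configuration, H equals
% the winding number of U, and under Maxwell vacuum evolution
% H(t \to \infty) = \tfrac{1}{2}\, H(0).
```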

    Revisiting Pattern Structure Projections

    Formal concept analysis (FCA) is a well-founded method for data analysis and has many applications in data mining. Pattern structures are an extension of FCA for dealing with complex data such as sequences or graphs. However, the computational complexity of computing with pattern structures is high, so projections of pattern structures were introduced to simplify computation. In this paper we introduce o-projections of pattern structures, a generalization of projections that defines a wider class of projections preserving the properties of the original approach. Moreover, we show that o-projections form a semilattice, and we discuss the correspondence between o-projections and the representation contexts of o-projected pattern structures.
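    The properties a projection must preserve can be stated as kernel-operator axioms on the ordered set of descriptions; the small check below tests them on interval descriptions (a generic property check to fix intuitions, not the paper's o-projection construction):

```python
def is_projection(psi, elements, leq):
    """Kernel-operator axioms for a projection on descriptions:
    contractive (psi(x) is more general than x), monotone, idempotent.
    `leq(p, q)` reads: p subsumes (is more general than) q."""
    contractive = all(leq(psi(x), x) for x in elements)
    monotone = all(leq(psi(x), psi(y))
                   for x in elements for y in elements if leq(x, y))
    idempotent = all(psi(psi(x)) == psi(x) for x in elements)
    return contractive and monotone and idempotent

# Interval descriptions: a wider interval subsumes a narrower one.
intervals = [(a, b) for a in range(4) for b in range(a, 4)]
leq = lambda p, q: p[0] <= q[0] and q[1] <= p[1]
psi = lambda p: (p[0] - p[0] % 2, p[1] + p[1] % 2)   # widen to even endpoints
print(is_projection(psi, intervals, leq))             # -> True
```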

    DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups

    We strive to find contexts (i.e., subgroups of entities) under which exceptional (dis-)agreement occurs among a group of individuals, in any type of data featuring individuals (e.g., parliamentarians, customers) performing observable actions (e.g., votes, ratings) on entities (e.g., legislative procedures, movies). To this end, we introduce the problem of discovering statistically significant exceptional contextual intra-group agreement patterns. To handle the sparsity inherent to voting and rating data, we use Krippendorff's Alpha measure for assessing the agreement among individuals. We devise a branch-and-bound algorithm, named DEvIANT, to discover such patterns. DEvIANT exploits both closure operators and tight optimistic estimates. We derive analytic approximations for the confidence intervals (CIs) associated with patterns for a computationally efficient significance assessment. We prove that these approximate CIs are nested along specialization of patterns. This allows us to incorporate pruning properties in DEvIANT to quickly discard non-significant patterns. An empirical study on several datasets demonstrates the efficiency and the usefulness of DEvIANT. (Technical report associated with the ECML/PKDD 2019 paper entitled "DEvIANT: Discovering Significant Exceptional (Dis-)Agreement Within Groups".)
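    Since Krippendorff's Alpha is central to the approach, here is a minimal nominal-data version of the measure itself (the paper additionally builds confidence-interval approximations on top of it, which this sketch does not cover):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data. `units` is a list of rating
    lists, one per rated entity; units with fewer than two ratings are
    ignored, which suits the sparsity of voting/rating data."""
    o = Counter()                        # coincidence matrix
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            o[(a, b)] += 1 / (m - 1)
    n_c = Counter()
    for (a, _), w in o.items():
        n_c[a] += w
    n = sum(n_c.values())
    d_o = sum(w for (a, b), w in o.items() if a != b)      # observed disagreement
    d_e = sum(n_c[a] * n_c[b]                              # expected disagreement
              for a in n_c for b in n_c if a != b) / (n - 1)
    return 1 - d_o / d_e

# Perfect agreement gives alpha = 1; chance-level agreement hovers near 0.
print(krippendorff_alpha_nominal([["y", "y"], ["n", "n"], ["y", "y", "n"]]))
```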